Xiaoting #554

xiaotinghe · 2025-03-03T14:31:39Z

Fixes #

🤖 AI-Generated PR Description (Powered by Amazon Bedrock)

Description

This pull request includes several changes to the model/etl/code directory. The main changes are:

- Removal of the aikits_utils.py file, which was likely deprecated or no longer needed.
- Addition of new files: config.py, gpu_config.py, imaug/preprocess.py, model_config.py, postprocess/nms.py, prompt/chart.txt, and prompt/description.txt. These files likely contain new configurations, utilities, and prompt files for the model.
- Modifications to existing files: figure_llm.py, imaug/__init__.py, layout.py, main.py, ocr.py, postprocess/__init__.py, prompt/mermaid_template.txt, requirements.txt, sm_predictor.py, table.py, and utils.py. These changes likely include bug fixes, refactoring, or new features related to the model's functionality.
- An error with the untitled.txt file, which should be addressed or removed.

No dependencies are explicitly mentioned, but the requirements.txt file has been modified, which may indicate changes to the project's dependencies.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

File Stats Summary

Number of files involved in this PR: 20; unfold to see the details:

The file changes summary is as follows:

| Files | Changes | Change Summary |
|:-------|:--------|:--------------|
| source/model/etl/code/aikits_utils.py | 0 added, 50 removed | This file is removed in this PR |
| source/infrastructure/lib/user/user-construct.ts | 0 added, 1 removed | The code changes involve removing the import of `AdvancedSecurityMode` and updating the import statements for AWS CDK components related to Cognito User Pools. |
| source/model/etl/code/gpu_config.py | 13 added, 0 removed | This code checks for available CUDA GPUs and sets the execution provider, batch size, and layout model accordingly for optimal performance on the available hardware. |
| source/model/etl/code/imaug/preprocess.py | 35 added, 0 removed | This code defines a function to preprocess images for model input by resizing and padding them to a target size while maintaining the aspect ratio, and swapping color channels. |
| source/model/etl/code/requirements.txt | 2 added, 1 removed | The code change adds the openai library with version 1.0.0 or higher to the list of required dependencies. |
| source/model/etl/code/imaug/__init__.py | 4 added, 0 removed | The code changes import the preprocess function from the preprocess module and add it to the `__all__` list for exposing it in the current namespace. |
| source/model/etl/code/model_config.py | 78 added, 0 removed | This code defines configuration dictionaries for different OCR models (Chinese, English, and multilingual), layout analysis, and table recognition, specifying file paths, preprocessing/postprocessing steps, and other parameters. |
| source/model/etl/code/prompt/chart.txt | 16 added, 0 removed | This code provides instructions for converting a chart image into a Markdown table format by carefully observing the structure and data within the image, utilizing context information enclosed in tags, and following guidelines for creating the Markdown table with proper formatting and alignment. |
| source/model/etl/code/config.py | 73 added, 0 removed | This code defines a ModelConfig class to manage model configurations for different languages (Chinese and English) and model types (detection and recognition). It provides methods to retrieve model paths, dictionary paths, and post-processing configurations based on the specified language and model type. |
| source/model/etl/code/prompt/description.txt | 8 added, 0 removed | This code defines an image analysis task that asks a senior expert to clearly describe the content details of a given illustration, including any text, to use the provided context information to help understand and describe it, and finally to write the description inside tags. |
| source/model/etl/code/postprocess/__init__.py | 7 added, 1 removed | The code changes include importing the 'nms' and 'multiclass_nms' functions from the 'nms' module, adding them to the `__all__` list along with 'postprocess', and removing the 'build_post_process' function. |
| source/model/etl/code/prompt/mermaid_template.txt | 0 added, 1 removed | The code changes remove an empty line and provide examples of mermaid diagram templates with descriptions and codes for visualizing workflows.

```mermaid
flowchart LR
    A[Code Changes] --> B[Remove Empty Line]
    A --> C[Add Diagram Examples]
    C --> D[Visualize Workflows]
``` |
| source/model/etl/code/postprocess/nms.py | 111 added, 0 removed | This code implements Non-Maximum Suppression (NMS) algorithms for single-class and multi-class object detection, along with post-processing functions for model outputs. |
| source/model/etl/code/sm_predictor.py | 23 added, 12 removed | The code changes involve removing the import of `lambda_return` function, adding a `create_response` function to handle Flask responses with CORS headers, modifying the `handler` function to use `create_response` instead of `lambda_return`, and updating the `/transformation` route to use the new response handling approach. |
| source/model/etl/code/layout.py | 17 added, 15 removed | The code changes involve restructuring the imports, introducing configuration files for GPU provider and model settings, and modifying the LayoutPredictor class to use these configurations. Additionally, it adds aspect ratio handling and configurable thresholds for non-maximum suppression and score filtering. |
| source/model/etl/code/utils.py | 107 added, 151 removed | The code changes involve replacing the previous object detection code with new functions for reading images from various sources (URLs, S3, base64), handling different image formats (GIF, PDF), and creating a standardized Lambda function response structure. |
| source/model/etl/code/main.py | 39 added, 10 removed | The code changes introduce concurrent processing of figures using the ThreadPoolExecutor from the concurrent.futures module. It replaces the sequential processing of figures with a parallel approach, where figures are submitted to a pool of worker threads. This change aims to improve performance by letting the figure understanding calls overlap instead of running one after another. |
| source/model/etl/code/figure_llm.py | 110 added, 35 removed | The code changes introduce support for using OpenAI models in addition to Bedrock models, add logging, and refactor the codebase to improve modularity and maintainability. The key changes include handling API keys from AWS Secrets Manager, invoking OpenAI models via the OpenAI API, and separating prompts into individual files. |
| source/model/etl/code/table.py | 21 added, 17 removed | The code changes introduce a configuration file (model_config.py) to store model-related settings like session options, preprocessing steps, postprocessing parameters, and table matching configurations. The session options, preprocessing operators, and postprocessing parameters are now loaded from this configuration file instead of being hardcoded. Additionally, file paths are constructed using os.path.join for better cross-platform compatibility. |
| source/model/etl/code/ocr.py | 16 added, 43 removed | The code changes involve the following:

1. Introduced a `get_provider_config` function to determine the execution provider and batch size based on GPU availability.
2. Used `os.path.join` for constructing file paths.
3. Moved model configuration details to a separate `MODEL_CONFIGS` module.
4. Simplified model path construction based on language using `MODEL_CONFIGS`.
5. Consolidated post-processing parameter configuration using `MODEL_CONFIGS`. |

</details>
  

</details>
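The `gpu_config.py` and `ocr.py` rows above describe choosing an execution provider and batch size from GPU availability. A minimal sketch of that selection logic follows. It assumes ONNX Runtime-style provider names, which the description does not confirm. The function name `choose_provider_config`, the batch sizes, and the returned keys are hypothetical and not taken from the PR.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical batch sizes; the PR description does not state the real values.
GPU_BATCH_SIZE = 16
CPU_BATCH_SIZE = 1


def choose_provider_config(
    available: Sequence[str],
) -> Dict[str, Union[List[str], int]]:
    """Pick execution providers and a recognition batch size.

    `available` is the list of provider names the runtime reports. If the
    backend is ONNX Runtime (an assumption), this would be the result of
    onnxruntime.get_available_providers().
    """
    if "CUDAExecutionProvider" in available:
        # Keep the CPU provider as a fallback behind the GPU provider.
        return {
            "providers": ["CUDAExecutionProvider", "CPUExecutionProvider"],
            "rec_batch_num": GPU_BATCH_SIZE,
        }
    return {"providers": ["CPUExecutionProvider"], "rec_batch_num": CPU_BATCH_SIZE}
```

Passing the provider list in as an argument keeps the decision testable on machines without a GPU.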



<details>
<summary>🤖 AI-Generated PR Description (Powered by Amazon Bedrock)</summary>

# Description
This pull request introduces several changes to the project, including the addition of new files, removal of existing files, and modifications to existing files. The primary changes are as follows:

- Added new files for configuration management (`config.py`, `gpu_config.py`, `model_config.py`), image augmentation (`imaug/preprocess.py`), non-maximum suppression (`postprocess/nms.py`), and prompt templates (`prompt/chart.txt`, `prompt/description.txt`).
- Removed the `aikits_utils.py` file.
- Modified existing files for figure processing (`figure_llm.py`), image augmentation (`imaug/__init__.py`), layout management (`layout.py`), main execution (`main.py`), optical character recognition (`ocr.py`), post-processing (`postprocess/__init__.py`), mermaid template (`prompt/mermaid_template.txt`), requirements (`requirements.txt`), prediction (`sm_predictor.py`), table processing (`table.py`), and utilities (`utils.py`).
- Encountered an error with the `untitled.txt` file.

These changes aim to enhance the project's functionality, improve configuration management, introduce image augmentation techniques, implement non-maximum suppression for post-processing, and provide additional prompt templates for figure and description generation.
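For readers unfamiliar with the non-maximum suppression step mentioned above, here is a minimal pure-Python sketch of the greedy single-class variant. It is an illustration only, not the code in `postprocess/nms.py`. The name `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold are assumptions.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: List[Box], scores: List[float], iou_thr: float = 0.45
) -> List[int]:
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep
```

A multi-class variant would typically run the same routine once per class and merge the surviving indices.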

## Type of change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] This change requires a documentation update
## File Stats Summary

Number of files involved in this PR: *19*; unfold to see the details:

<details>

The file changes summary is as follows:

| <div style="width:150px">Files</div> | <div style="width:160px">Changes</div> | <div style="width:320px">Change Summary</div> |
|:-------|:--------|:--------------|
| source/model/etl/code/aikits_utils.py | 0 added, 50 removed | This file is removed in this PR |
| source/model/etl/code/imaug/preprocess.py | 35 added, 0 removed | This code defines a function `preprocess` that preprocesses an input image for a model by resizing and padding it to a specified input size, and swapping channels if needed. |
| source/model/etl/code/imaug/__init__.py | 4 added, 0 removed | The code changes import the `preprocess` module and add it to the `__all__` list for exposing it publicly. |
| source/model/etl/code/postprocess/__init__.py | 7 added, 1 removed | The code changes introduce imports for nms, multiclass_nms, and postprocess functions, and update the __all__ list to include these functions. |
| source/model/etl/code/requirements.txt | 2 added, 1 removed | The code changes involve adding the 'openai' library version 1.0.0 or higher to the list of required dependencies. |
| source/model/etl/code/config.py | 73 added, 0 removed | This code defines a ModelConfig class that manages configurations for different language models (Chinese and English) for text detection and recognition tasks. It provides methods to retrieve model paths, dictionary paths, and post-processing configurations based on the specified language and model type. |
| source/model/etl/code/prompt/mermaid_template.txt | 0 added, 1 removed | The code changes remove an empty line from the documentation, making the content more concise.

```mermaid
graph LR
A[Code Changes] --> B[Remove Empty Line]
B --> C[Concise Documentation]
``` |
| source/model/etl/code/layout.py | 17 added, 15 removed | The code changes involve restructuring the imports, introducing configuration files for model and GPU settings, and adding configuration parameters for non-maximum suppression threshold, score threshold, image size, and aspect ratio threshold to the LayoutPredictor class. |
| source/model/etl/code/main.py | 39 added, 10 removed | The code changes introduce concurrent processing of figures using ThreadPoolExecutor for figure understanding, with a configurable maximum number of workers. It replaces the sequential processing of figures with concurrent processing, potentially improving performance. Additionally, it logs the time taken for figure processing. |
| source/model/etl/code/prompt/description.txt | 8 added, 0 removed | This code change introduces a task for an experienced image analysis expert to describe the details shown in a given illustration, including any text present, while utilizing the provided context enclosed within <doc></doc> tags to better understand and describe the image. The expert's description should be written within <output></output> XML tags. |
| source/model/etl/code/gpu_config.py | 13 added, 0 removed | The code checks for available GPUs, sets execution providers, recognition batch size, and layout model path accordingly for CPU or GPU execution. |
| source/model/etl/code/model_config.py | 78 added, 0 removed | This code defines configuration dictionaries for a model, layout analysis, and table recognition in an OCR (Optical Character Recognition) system, with settings for different languages, preprocessing/postprocessing steps, and model paths. |
| source/model/etl/code/figure_llm.py | 110 added, 35 removed | This code update adds support for using OpenAI models alongside Bedrock models, retrieves API keys from AWS Secrets Manager, and includes new prompts for chart descriptions and image descriptions. It also adds logging configuration and handles environment variables for OpenAI API base URL. |
| source/model/etl/code/sm_predictor.py | 23 added, 12 removed | The code changes involve refactoring the handler function to create a separate create_response function for generating Flask responses with CORS headers. The lambda_return import is removed, and the handler function now returns the response directly using create_response. The transformation function is also updated to use the new create_response function. |
| source/model/etl/code/utils.py | 107 added, 151 removed | The code changes involve refactoring to add functions for reading images from various sources (URLs, S3, base64) and handling different image formats (GIF, PDF). It also adds a utility function for creating a standardized Lambda function response. |
| source/model/etl/code/postprocess/nms.py | 111 added, 0 removed | The code changes implement non-maximum suppression (NMS) algorithms for single and multi-class object detection, along with post-processing of model outputs for object detection. |
| source/model/etl/code/ocr.py | 16 added, 43 removed | The code changes involve refactoring and modularization of the code. It introduces new modules `gpu_config` and `model_config` to handle GPU configuration and model configurations respectively. The initialization of TextDetector and TextRecognizer classes has been simplified by using configurations from `model_config`. The file paths are now constructed using `os.path.join` for better cross-platform compatibility. Additionally, some redundant code and comments have been removed. |
| source/model/etl/code/prompt/chart.txt | 16 added, 0 removed | This code provides instructions for converting a chart image into a Markdown table based on context information. The key steps involve finding the chart, observing its structure and data, using context to understand it, converting the data into a Markdown table format following specific guidelines, and returning only the Markdown table within XML tags. |
| source/model/etl/code/table.py | 21 added, 17 removed | The code changes introduce a configuration file (model_config.py) to store model settings like session options, preprocessing, postprocessing, and table matching parameters. The session options and postprocessing parameters are loaded from the configuration file instead of being hardcoded. The table matching configuration is also loaded from the config file. |

</details>
  

</details>
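The `main.py` row describes replacing sequential figure processing with a `ThreadPoolExecutor`, a configurable worker count, and a timing log. A minimal sketch of that pattern is below. The function name `process_figures`, the `understand` callable, and the default of four workers are hypothetical stand-ins for whatever the PR actually uses.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

logger = logging.getLogger(__name__)


def process_figures(
    figures: Sequence[str],
    understand: Callable[[str], str],
    max_workers: int = 4,  # hypothetical default
) -> List[str]:
    """Run `understand` on every figure concurrently, preserving input order."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `figures`,
        # regardless of which call finishes first.
        results = list(pool.map(understand, figures))
    elapsed = time.perf_counter() - start
    logger.info("Processed %d figures in %.2fs", len(figures), elapsed)
    return results
```

Threads suit this step when each call mostly waits on a remote model endpoint; an exception raised by any call is re-raised when its result is collected.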

- Extract configuration into dedicated config files for models and processing
- Add OpenAI API support as alternative to Bedrock for figure understanding
- Improve code organization and modularity:
  - Move preprocessing and utility functions into dedicated modules
  - Extract GPU configuration into separate file
  - Centralize model configurations
  - Add proper error handling and logging
- Add concurrent figure processing using ThreadPoolExecutor
- Improve code readability and maintainability:
  - Add comprehensive type hints and docstrings
  - Remove duplicate code
  - Standardize response handling
  - Update configuration management
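As an illustration of the "standardize response handling" item, the sketch below builds every response through one helper. It uses a plain dictionary in the Lambda proxy style (`statusCode`, `headers`, `body`) rather than a Flask response object. The name `build_response` and the exact CORS header set are assumptions, not the PR's `create_response`.

```python
import json
from typing import Any, Dict

# Assumed header set; the PR description only says "CORS headers".
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Content-Type",
    "Access-Control-Allow-Methods": "OPTIONS,POST",
}


def build_response(status_code: int, payload: Any) -> Dict[str, Any]:
    """Build one response shape for every handler exit path."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json", **CORS_HEADERS},
        "body": json.dumps(payload),
    }
```

Routing both success and error paths through the same helper keeps headers and body encoding consistent across handlers.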

xiaotinghe force-pushed the xiaoting branch from e4b597d to 8e411a9 Compare March 3, 2025 14:36

xiaotinghe closed this Mar 3, 2025

xiaotinghe commented Mar 3, 2025 • edited by github-actions bot